Data Visualization Notes

This is a starter RMarkdown template to accompany Data Visualization (Princeton University Press, 2019). You can use it to take notes, write your code, and produce a good-looking, reproducible document that records the work you have done. At the very top of the file is a section of metadata, or information about what the file is and what it does. The metadata is delimited by three dashes at the start and another three at the end. You should change the title, author, and date to the values that suit you. Keep the output line as it is for now, however. Each line in the metadata has a structure. First the key (“title”, “author”, etc), then a colon, and then the value associated with the key.

This is an RMarkdown File

Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. A code chunk is a specially delimited section of the file. You can add one by moving the cursor to a blank line choosing Code > Insert Chunk from the RStudio menu. When you do, an empty chunk will appear:

Code chunks are delimited by three backticks (found to the left of the 1 key on US and UK keyboards) at the start and end. The opening backticks also have a pair of braces and the letter r, to indicate what language the chunk is written in. You write your code inside the code chunks. Write your notes and other material around them, as here.

Before you Begin

To install the tidyverse, make sure you have an Internet connection. Then manually run the code in the chunk below. If you knit the document if will be skipped. We do this because you only need to install these packages once, not every time you run this file. Either knit the chunk using the little green “play” arrow to the right of the chunk area, or copy and paste the text into the console window.

## This code will not be evaluated automatically.
## (Notice the eval = FALSE declaration in the options section of the
## code chunk)

my_packages <- c("tidyverse", "broom", "coefplot", "cowplot",
                 "gapminder", "GGally", "ggrepel", "ggridges", "gridExtra",
                 "here", "interplot", "margins", "maps", "mapproj",
                 "mapdata", "MASS", "quantreg", "rlang", "scales",
                 "survey", "srvyr", "viridis", "viridisLite", "devtools", "tibble")

install.packages(my_packages, repos = "http://cran.rstudio.com")

Set Up Your Project and Load Libraries

To begin we must load some libraries we will be using. If we do not load them, R will not be able to find the functions contained in these libraries. The tidyverse includes ggplot and other tools. We also load the socviz and gapminder libraries.

Notice that here, the braces at the start of the code chunk have some additional options set in them. There is the language, r, as before. This is required. Then there is the word setup, which is a label for your code chunk. Labels are useful to briefly say what the chunk does. Label names must be unique (no two chunks in the same document can have the same label) and cannot contain spaces. Then, after the comma, an option is set: include=FALSE. This tells R to run this code but not to include the output in the final document.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

gapminder
## # A tibble: 1,704 x 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # ... with 1,694 more rows

The remainder of this document contains the chapter headings for the book, and an empty code chunk in each section to get you started. Try knitting this document now by clicking the “Knit” button in the RStudio toolbar, or choosing File > Knit Document from the RStudio menu.

Look at Data

Get Started

library(tidyverse)
library(socviz)
url <- "https://cdn.rawgit.com/kjhealy/viz-organdata/master/organdonation.csv"
organs <- read_csv(file = url)
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   country = col_character(),
##   world = col_character(),
##   opt = col_character(),
##   consent.law = col_character(),
##   consent.practice = col_character(),
##   consistent = col_character(),
##   ccode = col_character()
## )
## See spec(...) for full column specifications.
library(gapminder)
gapminder
## # A tibble: 1,704 x 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # ... with 1,694 more rows
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point()

Make a Plot

library(gapminder)
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_smooth(method ="loess")+ geom_point(alpha = 0.3, mapping = aes(color = factor(year))) + scale_x_log10(labels = scales::dollar) + labs(x = "GDP Per Capita", y = "Life Expectancy in Years", title = "Economic Growth and Life Expectancy", subtitle = "Data points are country-years", caption = "Source: Gapminder.")

#ggsave(here("figures", "lifexp_vs_gdp_gradient.pdf"))

Show the Right Numbers

p <- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap))
p + geom_line(aes(color = "gray70", group = country)) + geom_smooth(size = 1.1, method = "loess", se = FALSE) + scale_y_log10(labels = scales::dollar) + facet_wrap(~continent, ncol = 5) + labs(x = "Year", y = "GDP per capita", title = "GDP per capita on Five Continents")

p <- ggplot(data = gss_sm, mapping = aes( x = age, y = childs))
p + geom_point(alpha = 0.2) + geom_smooth() + facet_grid(sex ~ race)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 18 rows containing non-finite values (stat_smooth).
## Warning: Removed 18 rows containing missing values (geom_point).

p <- ggplot(data = gss_sm, mapping = aes(x = bigregion))
p + geom_bar(mapping = aes(y = ..prop.., group = 1))

p <- ggplot(data = gss_sm, mapping = aes(x = religion, fill = religion))
p + geom_bar() + guides(fill = FALSE)

p <- ggplot(data = gss_sm, mapping = aes(x = religion))
p + geom_bar(position = "dodge", mapping = aes(y = ..prop.., group = bigregion))+
 facet_wrap(~bigregion, ncol = 2)

p <- ggplot(data = midwest, mapping = aes(x = area))
p + geom_histogram(bins = 10)

oh_wi <- c("OH", "WI")
p <- ggplot(data = subset(midwest, subset = state %in% oh_wi), mapping = aes(x = percollege, fill = state))
p + geom_histogram(alpha = 0.4, bins = 20)

p <- ggplot(data = midwest, mapping = aes(x = area))
p + geom_density()

p <- ggplot(data = midwest, mapping = aes(x = area, fill = state, color = state))
p + geom_density(alpha = 0.3)

p <- ggplot(data = subset(midwest,subset = state %in% oh_wi), mapping = aes(x = area, fill = state, color = state))
p + geom_density(alpha = 0.3, mapping = (aes(y = ..scaled..)))

p <- ggplot(data = titanic, mapping = aes(x = fate, y = percent, fill = sex))
p + geom_bar(position = "dodge", stat = "identity") + theme(legend.position ="top")

p <- ggplot(data = oecd_sum, mapping = aes(x = year, y = diff, fill = hi_lo))
p + geom_col() + guides(fill = FALSE) + labs(x = NULL, y = "Difference in Years", title = "The US Life Expectancy Gap", subtitle = "Difference between US and OECD average life expenctancies, 1960-2015", caption = ("Data: OECD, After a chart by Christopher Ingraham, Washington Post, December 27th, 2017."))
## Warning: Removed 1 rows containing missing values (position_stack).

p <- ggplot(data = midwest, mapping = aes(x = percbelowpoverty, y = percollege))
p + geom_density_2d() + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Graph Tables, Make Labels, Add Notes

rel_by_region <- gss_sm %>%
  group_by(bigregion, religion) %>%
  summarize(N = n()) %>%
  mutate (freq = N / sum(N), pct = round((freq*100),0))
## Warning: Factor `religion` contains implicit NA, consider using
## `forcats::fct_explicit_na`
p <- ggplot(rel_by_region, aes(x = bigregion, y = pct, fill = religion))
p + geom_col(position = "dodge2") + labs(x = "Region", y = "Percent", fill = "Religion") + theme(legend.position = "top")

p <- ggplot(rel_by_region, aes(x = religion, y = pct, fill = religion))
p + geom_col(position = "dodge2") + labs(x = NULL, y = "Percent", fill = "Religion") + guides(fill = FALSE) + coord_flip() + facet_grid(~bigregion)

p <- ggplot(data = organdata, mapping = aes(x = year, y = donors))
p + geom_line(aes(group = country)) + facet_wrap(~country)
## Warning: Removed 34 rows containing missing values (geom_path).

p <- ggplot(data=organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), y = donors, fill = world))
p + geom_boxplot() + labs(x = NULL) + coord_flip() + theme(legend.position = "top")
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).

p <- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), y = donors, color = world))
p + geom_jitter(position = position_jitter(width = 0.15)) + labs(x = NULL) + coord_flip() + theme(legend.position = "top")
## Warning: Removed 34 rows containing missing values (geom_point).

by_country <- organdata %>% group_by(consent_law, country) %>% summarize(donors_mean = mean(donors, na.rm = TRUE), donors_sd = sd(donors, na.rm = TRUE), gdp_mean = mean( gdp, na.rm = TRUE), health_mean = mean(health, na.rm = TRUE), roads_mean = mean(roads, na.rm = TRUE), cerebvas_mean = mean(cerebvas, na.rm = TRUE))

by_country <- organdata %>% group_by(consent_law, country) %>% summarize_if(is.numeric, funs(mean, sd), na.rm = 
                                  TRUE) %>% ungroup()
## Warning: funs() is soft deprecated as of dplyr 0.8.0
## Please use a list of either functions or lambdas: 
## 
##   # Simple named list: 
##   list(mean = mean, median = median)
## 
##   # Auto named with `tibble::lst()`: 
##   tibble::lst(mean, median)
## 
##   # Using lambdas
##   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once per session.
p <- ggplot(data = by_country, mapping = aes(x = donors_mean, y = reorder(country, donors_mean), color = consent_law))
p + geom_point(size = 3) + labs(x = "Donor Procurement Rate", y = "", color = "Consent Law") + theme(legend.position = "top")

p <- ggplot(data = by_country, mapping = aes(x = donors_mean, y = reorder(country, donors_mean)))
p + geom_point(size = 3) + facet_wrap(~consent_law, scales = "free_y", ncol = 1) + labs(x = "Donor Procurement Rate", y = "")

p <- ggplot(data = by_country, mapping = aes(x = reorder(country, donors_mean), y = donors_mean))
p + geom_pointrange(mapping = aes(ymin = donors_mean - donors_sd, ymax = donors_mean + donors_sd)) + labs(x = "Donor Procurement Rate", y = "") + coord_flip()

p <- ggplot(data = by_country, mapping = aes(x = roads_mean, y = donors_mean))
p + geom_point() + geom_text(mapping = aes(label = country),hjust = 0)

library(ggrepel)
p_title <- "Presidential Elections: Popular & Electoral College Margins"
p_subtitle <- "1824-2016"
p_caption <- "Data for 2016 are provisional"
x_label <- "Winner's share of Popular Vote"
y_label <- "Winner's share of Electoral College Vote"
p <- ggplot(elections_historic, aes(x = popular_pct, y = ec_pct, label = winner_label, color = win_party))
p + geom_hline(yintercept = 0.5, size = 1.4, color = "gray80") +
  geom_vline(xintercept = 0.5, size = 1.4, color = "gray80") +
  geom_point() +
  geom_text_repel() +
  scale_x_continuous(labels = scales::percent) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = x_label, y = y_label, title = p_title, subtitle = p_subtitle, caption = p_caption)

p <- ggplot(data = by_country, mapping = aes(x = gdp_mean, y = health_mean))
p + geom_point() + geom_text_repel(data = subset(by_country, gdp_mean > 25000), mapping = aes(label = country))

p <- ggplot(data = by_country, mapping = aes(x = gdp_mean, y = health_mean))
p + geom_point() + geom_text_repel(data = subset(by_country, gdp_mean > 25000 | health_mean < 1500), mapping = aes(label = country))

organdata$ind <- organdata$ccode %in% c("Ita", "Spa") & organdata$year > 1998
p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = ind))
p + geom_point() + geom_text_repel(data = subset(organdata, ind), mapping = aes(label = ccode)) + guides(label = FALSE, color = FALSE)
## Warning: Removed 34 rows containing missing values (geom_point).

p <- ggplot2::ggplot(data = organdata, mapping = aes(x = roads, y = donors))
p + geom_point() + annotate(geom = "text", x = 91, y = 35, label = "A surprisingly high \n recovery rate.", hjust = 0)
## Warning: Removed 34 rows containing missing values (geom_point).

p <- ggplot2::ggplot(data = organdata, mapping = aes(x = roads, y = donors))
p + geom_point() + annotate(geom = "rect", xmin = 125, xmax = 155, ymin = 30, ymax = 35, fill = "red", alpha = 0.2) + annotate(geom = "text", x = 157, y = 33, label = "A surprisingly high \n recovery rate.", hjust = 0 )
## Warning: Removed 34 rows containing missing values (geom_point).

p <- ggplot2::ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = world))
p + geom_point() + scale_x_log10() + scale_y_continuous(breaks = c(5, 15, 25), labels = c("Five", "Fifteen", "Twenty Five"))
## Warning: Removed 34 rows containing missing values (geom_point).

p <- ggplot2::ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = world))
p + geom_point() + scale_color_discrete(labels = c("Corporatist", "Liberal", "Social Democratic", "Unclassified")) + labs(x = "Road Deaths", y = "Donor Procurement", color = "Welfare State") + theme(legend.position = "top")
## Warning: Removed 34 rows containing missing values (geom_point).

## Work with Models

p <- ggplot(data = gapminder, mapping = aes(x = log(gdpPercap), y = lifeExp))
p + geom_point(alpha = 0.1) + geom_smooth(color = "tomato", fill = "tomato", method = MASS::rlm) + geom_smooth(color = "steelblue", fill = "steelblue", method = "lm")

p + geom_point(alpha=0.1) + geom_smooth(color = "tomato", method = "lm", size = 1.2, formula = y ~ splines::bs(x, 3), se = FALSE)

p + geom_point(alpha=0.1) + geom_quantile(color = "tomato", size = 1.2, method = "rqss", lambda = 1, quantiles = c(0.20, 0.5, 0.85))
## Smoothing formula not specified. Using: y ~ qss(x, lambda = 1)

model_colors <- RColorBrewer::brewer.pal(3, "Set1")
model_colors
## [1] "#E41A1C" "#377EB8" "#4DAF4A"
p0 <- ggplot(data = gapminder, mapping = aes(x = log(gdpPercap), y = lifeExp))

p1 <- p0 + geom_point(alpha = 0.2) + geom_smooth(method = "lm", aes(color = "OLS", fill = "OLS")) + geom_smooth(method = "lm", formula = y ~ splines::bs(x, df = 3), aes(color = "Cubic Spline", fill = "Cubic Spline")) + geom_smooth(method = "loess", aes(color = "LOESS", fill = "LOESS"))
p1 + scale_color_manual(name = "Models", values = model_colors) + scale_fill_manual(name = "Models", values = model_colors) + theme(legend.position = "top")

out <- lm(formula = lifeExp ~ gdpPercap + pop + continent, data = gapminder)
summary(out)
## 
## Call:
## lm(formula = lifeExp ~ gdpPercap + pop + continent, data = gapminder)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -49.161  -4.486   0.297   5.110  25.175 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       4.781e+01  3.395e-01 140.819  < 2e-16 ***
## gdpPercap         4.495e-04  2.346e-05  19.158  < 2e-16 ***
## pop               6.570e-09  1.975e-09   3.326 0.000901 ***
## continentAmericas 1.348e+01  6.000e-01  22.458  < 2e-16 ***
## continentAsia     8.193e+00  5.712e-01  14.342  < 2e-16 ***
## continentEurope   1.747e+01  6.246e-01  27.973  < 2e-16 ***
## continentOceania  1.808e+01  1.782e+00  10.146  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.365 on 1697 degrees of freedom
## Multiple R-squared:  0.5821, Adjusted R-squared:  0.5806 
## F-statistic: 393.9 on 6 and 1697 DF,  p-value: < 2.2e-16
min_gdp <- min(gapminder$gdpPercap)
max_gdp <- max(gapminder$gdpPercap)
med_pop <- median(gapminder$pop)

pred_df <- expand.grid(gdpPercap = (seq(from = min_gdp, to = max_gdp, length.out = 100)), pop = med_pop, continent = c("Africa", "Americas", "Asia", "Europe", "Oceania"))
dim(pred_df)
## [1] 500   3
pred_out <- predict(object = out, newdata = pred_df, interval = "predict")
head(pred_out)
##        fit      lwr      upr
## 1 47.96863 31.54775 64.38951
## 2 48.48298 32.06231 64.90365
## 3 48.99733 32.57670 65.41797
## 4 49.51169 33.09092 65.93245
## 5 50.02604 33.60497 66.44711
## 6 50.54039 34.11885 66.96193
pred_df <- cbind(pred_df, pred_out)
head(pred_df)
##   gdpPercap     pop continent      fit      lwr      upr
## 1  241.1659 7023596    Africa 47.96863 31.54775 64.38951
## 2 1385.4282 7023596    Africa 48.48298 32.06231 64.90365
## 3 2529.6905 7023596    Africa 48.99733 32.57670 65.41797
## 4 3673.9528 7023596    Africa 49.51169 33.09092 65.93245
## 5 4818.2150 7023596    Africa 50.02604 33.60497 66.44711
## 6 5962.4773 7023596    Africa 50.54039 34.11885 66.96193
p <- ggplot(data = subset(pred_df, continent %in% c("Europe", "Africa")), aes(x = gdpPercap, y = fit, ymin = lwr, ymax = upr, color = continent, fill = continent, group = continent))
p + geom_point(data = subset(gapminder, continent %in% c("Europe","Africa")), aes(x = gdpPercap, y = lifeExp, color = continent), alpha = 0.5, inherit.aes = FALSE) + geom_line() + geom_ribbon(alpha = 0.2, color = FALSE) + scale_x_log10(labels = scales::dollar) + theme(legend.position = "top")

library(broom)
out_comp <- tidy(out)
out_comp %>% round_df()
## # A tibble: 7 x 5
##   term              estimate std.error statistic p.value
##   <chr>                <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)          47.8      0.34     141.         0
## 2 gdpPercap             0        0         19.2        0
## 3 pop                   0        0          3.33       0
## 4 continentAmericas    13.5      0.6       22.5        0
## 5 continentAsia         8.19     0.570     14.3        0
## 6 continentEurope      17.5      0.62      28.0        0
## 7 continentOceania     18.1      1.78      10.2        0
p <- ggplot(out_comp, mapping = aes(x = term, y = estimate))
p + geom_point()+ coord_flip()

out_conf <- tidy(out, conf.int = TRUE)
out_conf %>% round_df()
## # A tibble: 7 x 7
##   term              estimate std.error statistic p.value conf.low conf.high
##   <chr>                <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
## 1 (Intercept)          47.8      0.34     141.         0    47.2      48.5 
## 2 gdpPercap             0        0         19.2        0     0         0   
## 3 pop                   0        0          3.33       0     0         0   
## 4 continentAmericas    13.5      0.6       22.5        0    12.3      14.6 
## 5 continentAsia         8.19     0.570     14.3        0     7.07      9.31
## 6 continentEurope      17.5      0.62      28.0        0    16.2      18.7 
## 7 continentOceania     18.1      1.78      10.2        0    14.6      21.6
out_conf <- subset(out_conf, term %nin% "(Intercept)")
out_conf$nicelabs <- prefix_strip(out_conf$term, "continent")
p <- ggplot(out_conf, mapping = aes(x = reorder(nicelabs, estimate),y = estimate, ymin = conf.low, ymax = conf.high))
p + geom_pointrange() + coord_flip() + labs(x = "", y = "OLS Estimate")

out_aug <- augment(out, data = gapminder)
head(out_aug) %>% round_df()
## # A tibble: 6 x 13
##   country continent  year lifeExp    pop gdpPercap .fitted .se.fit .resid  .hat
##   <fct>   <fct>     <dbl>   <dbl>  <dbl>     <dbl>   <dbl>   <dbl>  <dbl> <dbl>
## 1 Afghan~ Asia       1952    28.8 8.43e6      779.    56.4    0.47  -27.6     0
## 2 Afghan~ Asia       1957    30.3 9.24e6      821.    56.4    0.47  -26.1     0
## 3 Afghan~ Asia       1962    32   1.03e7      853.    56.5    0.47  -24.5     0
## 4 Afghan~ Asia       1967    34.0 1.15e7      836.    56.5    0.47  -22.4     0
## 5 Afghan~ Asia       1972    36.1 1.31e7      740.    56.4    0.47  -20.3     0
## 6 Afghan~ Asia       1977    38.4 1.49e7      786.    56.5    0.47  -18.0     0
## # ... with 3 more variables: .sigma <dbl>, .cooksd <dbl>, .std.resid <dbl>
p <- ggplot(data = out_aug, mapping = aes(x = .fitted, y = .resid))
p + geom_point()

glance(out) %>% round_df()
## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic p.value    df logLik    AIC    BIC
##       <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl>  <dbl>  <dbl>
## 1     0.580         0.580  8.37      394.       0     7 -6034. 12084. 12127.
## # ... with 2 more variables: deviance <dbl>, df.residual <dbl>
library(survival)
out_cph <- coxph(Surv(time, status) ~ age + sex, data = lung)
out_surv <- survfit(out_cph)
out_tidy <- tidy(out_surv)
p <- ggplot(data = out_tidy, mapping = aes(time, estimate))
p + geom_line() + geom_ribbon(mapping = aes(ymin = conf.low, ymax = conf.high), alpha = 0.2)

eu77 <- gapminder %>% filter(continent == "Europe", year == 1977)
fit <- lm(lifeExp ~ log(gdpPercap), data = eu77)
summary(fit)
## 
## Call:
## lm(formula = lifeExp ~ log(gdpPercap), data = eu77)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.4956 -1.0306  0.0935  1.1755  3.7125 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      29.489      7.161   4.118 0.000306 ***
## log(gdpPercap)    4.488      0.756   5.936 2.17e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.114 on 28 degrees of freedom
## Multiple R-squared:  0.5572, Adjusted R-squared:  0.5414 
## F-statistic: 35.24 on 1 and 28 DF,  p-value: 2.173e-06
out_le <- gapminder %>% group_by(continent, year) %>% nest()
out_le
## # A tibble: 60 x 3
## # Groups:   continent, year [60]
##    continent  year           data
##    <fct>     <int> <list<df[,4]>>
##  1 Asia       1952       [33 x 4]
##  2 Asia       1957       [33 x 4]
##  3 Asia       1962       [33 x 4]
##  4 Asia       1967       [33 x 4]
##  5 Asia       1972       [33 x 4]
##  6 Asia       1977       [33 x 4]
##  7 Asia       1982       [33 x 4]
##  8 Asia       1987       [33 x 4]
##  9 Asia       1992       [33 x 4]
## 10 Asia       1997       [33 x 4]
## # ... with 50 more rows
out_le %>% filter(continent == "Europe" & year == 1977) %>% unnest()
## # A tibble: 30 x 6
## # Groups:   continent, year [5]
##    continent  year country                lifeExp      pop gdpPercap
##    <fct>     <int> <fct>                    <dbl>    <int>     <dbl>
##  1 Europe     1977 Albania                   68.9  2509048     3533.
##  2 Europe     1977 Austria                   72.2  7568430    19749.
##  3 Europe     1977 Belgium                   72.8  9821800    19118.
##  4 Europe     1977 Bosnia and Herzegovina    69.9  4086000     3528.
##  5 Europe     1977 Bulgaria                  70.8  8797022     7612.
##  6 Europe     1977 Croatia                   70.6  4318673    11305.
##  7 Europe     1977 Czech Republic            70.7 10161915    14800.
##  8 Europe     1977 Denmark                   74.7  5088419    20423.
##  9 Europe     1977 Finland                   72.5  4738902    15605.
## 10 Europe     1977 France                    73.8 53165019    18293.
## # ... with 20 more rows
fit_ols <- function(df) {
  lm(lifeExp ~ log(gdpPercap), data = df)
}

out_tidy <- gapminder %>% group_by(continent, year) %>% nest() %>% mutate(model = map(data, fit_ols), tidied = map(model, tidy)) %>% unnest(tidied, .drop = TRUE) %>% filter(term %nin% "(Intercept)" & continent %nin% "Oceania")
# out_tidy %>% sample_n(5)
p <- ggplot(data = out_tidy, mapping = aes(x = year, y = estimate, ymin = estimate - 2*std.error, ymax = estimate + 2*std.error, group = continent, color = continent))
p + geom_pointrange(position = position_dodge(width = 1)) + scale_x_continuous(breaks = unique(gapminder$year)) + theme(legend.position = "top") + labs(x = "Year", y = "Estimate", color = "Continent")

library(margins)
gss_sm$polviews_m <- relevel(gss_sm$polviews, ref = "Moderate")
out_bo <- glm(obama ~ polviews_m + sex*race, family = "binomial", data = gss_sm)
summary(out_bo)
## 
## Call:
## glm(formula = obama ~ polviews_m + sex * race, family = "binomial", 
##     data = gss_sm)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9045  -0.5541   0.1772   0.5418   2.2437  
## 
## Coefficients:
##                                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                       0.296493   0.134091   2.211  0.02703 *  
## polviews_mExtremely Liberal       2.372950   0.525045   4.520 6.20e-06 ***
## polviews_mLiberal                 2.600031   0.356666   7.290 3.10e-13 ***
## polviews_mSlightly Liberal        1.293172   0.248435   5.205 1.94e-07 ***
## polviews_mSlightly Conservative  -1.355277   0.181291  -7.476 7.68e-14 ***
## polviews_mConservative           -2.347463   0.200384 -11.715  < 2e-16 ***
## polviews_mExtremely Conservative -2.727384   0.387210  -7.044 1.87e-12 ***
## sexFemale                         0.254866   0.145370   1.753  0.07956 .  
## raceBlack                         3.849526   0.501319   7.679 1.61e-14 ***
## raceOther                        -0.002143   0.435763  -0.005  0.99608    
## sexFemale:raceBlack              -0.197506   0.660066  -0.299  0.76477    
## sexFemale:raceOther               1.574829   0.587657   2.680  0.00737 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2247.9  on 1697  degrees of freedom
## Residual deviance: 1345.9  on 1686  degrees of freedom
##   (1169 observations deleted due to missingness)
## AIC: 1369.9
## 
## Number of Fisher Scoring iterations: 6
bo_m <- margins(out_bo)
summary(bo_m)
##                            factor     AME     SE        z      p   lower
##            polviews_mConservative -0.4119 0.0283 -14.5394 0.0000 -0.4674
##  polviews_mExtremely Conservative -0.4538 0.0420 -10.7971 0.0000 -0.5361
##       polviews_mExtremely Liberal  0.2681 0.0295   9.0996 0.0000  0.2103
##                 polviews_mLiberal  0.2768 0.0229  12.0736 0.0000  0.2319
##   polviews_mSlightly Conservative -0.2658 0.0330  -8.0596 0.0000 -0.3304
##        polviews_mSlightly Liberal  0.1933 0.0303   6.3896 0.0000  0.1340
##                         raceBlack  0.4032 0.0173  23.3568 0.0000  0.3694
##                         raceOther  0.1247 0.0386   3.2297 0.0012  0.0490
##                         sexFemale  0.0443 0.0177   2.5073 0.0122  0.0097
##    upper
##  -0.3564
##  -0.3714
##   0.3258
##   0.3218
##  -0.2011
##   0.2526
##   0.4371
##   0.2005
##   0.0789
bo_gg <- as_tibble(summary(bo_m))
prefixes <- c("polviews_m", "sex")
bo_gg$factor <- prefix_strip(bo_gg$factor, prefixes)
bo_gg$factor <- prefix_replace(bo_gg$factor, "race", "Race: ")
bo_gg %>% select(factor, AME, lower, upper)
## # A tibble: 9 x 4
##   factor                     AME    lower   upper
##   <chr>                    <dbl>    <dbl>   <dbl>
## 1 Conservative           -0.412  -0.467   -0.356 
## 2 Extremely Conservative -0.454  -0.536   -0.371 
## 3 Extremely Liberal       0.268   0.210    0.326 
## 4 Liberal                 0.277   0.232    0.322 
## 5 Slightly Conservative  -0.266  -0.330   -0.201 
## 6 Slightly Liberal        0.193   0.134    0.253 
## 7 Race: Black             0.403   0.369    0.437 
## 8 Race: Other             0.125   0.0490   0.200 
## 9 Female                  0.0443  0.00967  0.0789
p <- ggplot(data = bo_gg, aes(x = reorder(factor, AME), y = AME, ymin = lower, ymax = upper))
p + geom_hline(yintercept = 0, color = "gray80") + geom_pointrange() + coord_flip() + labs(x = NULL, y = "Average Marginal Effect")

pv_cp <- cplot(out_bo, x = "sex", draw = FALSE)
##    xvals     yvals     upper     lower
## 1   Male 0.5735849 0.6378653 0.5093045
## 2 Female 0.6344507 0.6887845 0.5801169
p <- ggplot(data = pv_cp,aes(x = reorder(xvals, yvals),y = yvals, ymin = lower, ymax = upper))
p + geom_hline(yintercept = 0, color = "gray80") + geom_pointrange() + coord_flip() + labs(x = NULL, y = "Conditional Effect")

library(survey)
## Loading required package: grid
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
## 
##     dotchart
library(srvyr)
## 
## Attaching package: 'srvyr'
## The following object is masked from 'package:stats':
## 
##     filter
options(survey.lonely.psu = "adjust")
options(na.action = "na.pass")

gss_wt <- subset(gss_lon, year > 1974) %>% mutate(stratvar = interaction(year, vstrat)) %>% as_survey_design(ids = vpsu, strata = stratvar, weights = wtssall, nest = TRUE)
out_grp <- gss_wt %>% filter(year %in% seq(1976, 2016, by = 4)) %>% group_by(year, race, degree) %>% summarize(prop = survey_mean(na.rm = TRUE))
out_grp
## # A tibble: 162 x 5
##     year race  degree            prop  prop_se
##    <dbl> <fct> <fct>            <dbl>    <dbl>
##  1  1976 White Lt High School  0.328   0.0160 
##  2  1976 White High School     0.518   0.0162 
##  3  1976 White Junior College  0.0129  0.00298
##  4  1976 White Bachelor        0.101   0.00960
##  5  1976 White Graduate        0.0393  0.00644
##  6  1976 White <NA>           NA      NA      
##  7  1976 Black Lt High School  0.562   0.0611 
##  8  1976 Black High School     0.337   0.0476 
##  9  1976 Black Junior College  0.0426  0.0193 
## 10  1976 Black Bachelor        0.0581  0.0239 
## # ... with 152 more rows
out_mrg <- gss_wt %>% filter(year %in% seq(1976, 2016, by = 4)) %>% mutate(racedeg = interaction(race, degree)) %>% group_by(year, racedeg) %>% summarize(prop = survey_mean(na.rm = TRUE))%>% separate(racedeg, sep = "\\.", into = c("race","degree"))
out_mrg
## # A tibble: 155 x 5
##     year race  degree            prop prop_se
##    <dbl> <chr> <chr>            <dbl>   <dbl>
##  1  1976 White Lt High School 0.298   0.0146 
##  2  1976 Black Lt High School 0.0471  0.00840
##  3  1976 Other Lt High School 0.00195 0.00138
##  4  1976 White High School    0.471   0.0160 
##  5  1976 Black High School    0.0283  0.00594
##  6  1976 Other High School    0.00325 0.00166
##  7  1976 White Junior College 0.0117  0.00268
##  8  1976 Black Junior College 0.00357 0.00162
##  9  1976 White Bachelor       0.0919  0.00888
## 10  1976 Black Bachelor       0.00487 0.00214
## # ... with 145 more rows
p <- ggplot(data = subset(out_grp, race %nin% "Other"), mapping = aes(x = degree, y = prop, ymin = prop - 2*prop_se, ymax = prop + 2*prop_se, fill = race, color = race, group = race))
dodge <- position_dodge(width = 0.9)
p + geom_col(position = dodge, alpha = 0.2) + geom_errorbar(position = dodge, width = 0.2) + scale_x_discrete(labels = scales::wrap_format(10)) + scale_y_continuous(labels = scales::percent) + 
  scale_color_brewer(type = "qual", palette = "Dark2") + scale_fill_brewer(type = "qual", palette = "Dark2") + labs(title = "Educational Attainment by Race", subtitle = "GSS 1976-2016", fill = "Race", color = "Race", x = NULL, y = "Percent") + facet_wrap(~year, ncol = 2) + theme(legend.position = "top")

p <- ggplot(data = subset(out_grp, race %nin% "Other"), mapping = aes(x = year, y = prop, ymin = prop - 2*prop_se, ymax = prop + 2*prop_se, fill = race, color = race, group = race))
p + geom_ribbon(alpha = 0.3, aes(color = NULL)) + geom_line() + facet_wrap(~degree, ncol = 5) + scale_y_continuous(labels = scales::percent) + scale_color_brewer(type = "qual", palette = "Dark2")+ scale_fill_brewer(type = "qual", palette = "Dark2") + labs(title = "Educational Attainment by Race", subtitle = "GSS 1976-2016", fill = "Race", color = "Race", x = NULL, y = "Percent") + theme(legend.position = "top")

out <- lm(formula = lifeExp ~ log(gdpPercap) + pop + continent, data = gapminder)
plot(out, which = c(1,2), ask = FALSE)

library(coefplot)
out <- lm( formula = lifeExp ~ log(gdpPercap) + log(pop) + continent, data = gapminder)
coefplot(out, sort = "magnitude", intercept = FALSE)

library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
## 
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
## 
##     nasa
organdata_sm <- organdata %>% select(donors, pop_dens, pubhealth, roads, consent_law)
ggpairs(data = organdata_sm, mapping = aes(color = consent_law), upper = list(continuous = wrap("density"), combo = "box_no_facet"),lower = list(continuous = wrap("points"), combo = wrap("dot_no_facet")))

Draw Maps

# Hex color codes are for Dem Blue and Rep Red
party_colors <- c("#2E74C0", "#CB454A")
p0 <- ggplot(data = subset(election, st %nin% "DC"), mapping = aes(x = r_points, y = reorder(state, r_points), color = party))
p1 <- p0 + geom_vline(xintercept = 0, color = "gray30") + geom_point(size = 2)
p2 <- p1 + scale_color_manual(values = party_colors)
p3 <- p2 + scale_x_continuous(breaks = c(-30, -20, -10, 0, 10, 20, 30, 40), labels = c("30\n(Clinton)", "20", "10", "0", "10", "20", "30", "40\n(Trump"))
p3 + facet_wrap(~census, ncol = 1, scales = "free_y") + guides(color = FALSE) + labs(x = "Point Margin", y = "") + theme(axis.text = element_text(size = 8))

library(maps)
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map
library(ggthemes)

us_states <- map_data("state")
head(us_states)
##        long      lat group order  region subregion
## 1 -87.46201 30.38968     1     1 alabama      <NA>
## 2 -87.48493 30.37249     1     2 alabama      <NA>
## 3 -87.52503 30.37249     1     3 alabama      <NA>
## 4 -87.53076 30.33239     1     4 alabama      <NA>
## 5 -87.57087 30.32665     1     5 alabama      <NA>
## 6 -87.58806 30.32665     1     6 alabama      <NA>
dim(us_states)
## [1] 15537     6
p <- ggplot(data = us_states, mapping = aes(x = long, y = lat, group = group, fill = region))
p + geom_polygon(color = "Gray90", size = 0.1) + coord_map(projection = "albers", lat0 = 39 , lat1 = 45) + guides(fill = FALSE)

election$region <- tolower(election$state)
us_states_elec <- left_join(us_states, election)
## Joining, by = "region"
p <- ggplot(data = us_states_elec, aes(x = long, y = lat, group = group, fill = party))
p + geom_polygon(color = "Gray90", size = 0.1) + coord_map(projection = "albers", lat0 = 39, lat1 = 45)

p0 <- ggplot(data=us_states_elec, mapping=aes(x=long,y=lat,group=group,fill=party))
p1 <- p0+geom_polygon(color="gray90", size=0.1)+coord_map(projection="albers",lat0=39,lat1=45)
p2 <- p1+scale_fill_manual(values=party_colors)+labs(title="Election Result 2016",fill=NULL)
p2 + theme_map()

p0 <- ggplot(data=us_states_elec, mapping=aes(x=long, y=lat, group=group, fill= pct_trump))
p1 <- p0 + geom_polygon(color="Gray90", size = 0.1) + coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p1 + labs(title = "Trump vote") + theme_map() + labs(fill = "Percent")

p2 <- p1 + scale_fill_gradient(low = "white", high = "#CB454A") + labs(title = "Trump vote")
p2 + theme_map() + labs(fill = "Percent")

p0 <- ggplot(data = us_states_elec, mapping = aes(x = long, y = lat, group = group, fill = d_points))
p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) + coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p2 <- p1 + scale_fill_gradient2() + labs(title = "Winning margins")
p2 + theme_map() + labs(fill = "Percent")

p3 <- p1 + scale_fill_gradient2(low = "red", mid = scales::muted("purple"), high = "blue", breaks = c(-25,0,25,50,75)) + labs(title = "Winning margins")
p3 + theme_map() + labs(fill = "Percent")

p0 <- ggplot(data = subset(us_states_elec, region %nin% "district of columbia"), mapping = aes(x = long, y = lat, group = group, fill = d_points))
p1 <- p0 + geom_polygon(color = "gray90", size = 0.1) + coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p2 <- p1 + scale_fill_gradient2(low = "red", mid = scales::muted("purple"), high = "blue") + labs(title = "Winning margins")
p2 + theme_map() + labs(fill = "Percent")

county_map %>% sample_n(5)
##        long         lat  order  hole piece            group    id
## 1 1423245.7  -836424.48 149769 FALSE     1 0500000US47001.1 47001
## 2 2195588.5   337747.38 168474 FALSE     1 0500000US50009.1 50009
## 3  415517.4 -1378725.57 164893 FALSE     1 0500000US48467.1 48467
## 4  852096.7 -1140827.79  96706 FALSE     1 0500000US28027.1 28027
## 5 2376294.3    83260.63  84898 FALSE     1 0500000US25023.1 25023
county_data %>% select(id, name, state, pop_dens, pct_black) %>% sample_n(5)
##      id            name state      pop_dens   pct_black
## 1 48473   Waller County    TX [   50,  100) [25.0,50.0)
## 2 19055 Delaware County    IA [   10,   50) [ 0.0, 2.0)
## 3 39095    Lucas County    OH [ 1000, 5000) [15.0,25.0)
## 4 28029   Copiah County    MS [   10,   50) [50.0,85.3]
## 5 45000              41    SC [  100,  500) [25.0,50.0)
county_full <- left_join(county_map, county_data, by = "id")

p0 <- ggplot(data = county_full, mapping = aes(x = long, y = lat, group = group, fill = pop_dens))
p1 <- p0 + geom_polygon(color = "gray90", size = 0.5) + coord_equal()
p2 <- p1 + scale_fill_brewer(palette = "Blues", labels = c("0-10", "10-50", "50-100", "100-500", "500-1,000", "1,000-5,000", ">5,000"))
p2 + labs(fill = "Population per\nsquare mile") + theme_map() + guides(fill = guide_legend(nrow = 1)) + theme(legend.position = "bottom")

p0 <- ggplot(data = county_full, mapping = aes(x = long, y = lat, group = group, fill = pct_black))
p1 <- p0 + geom_polygon(color = "gray90", size = 0.5) + coord_equal()
p2 <- p1 + scale_fill_brewer(palette = "Greens")
p2 + labs(fill = "US Population, Percent Black") + guides(fill = guide_legend(nrow = 1)) + theme_map() + theme(legend.position = "bottom")

orange_pal <- RColorBrewer::brewer.pal(n = 6, name = "Oranges")
orange_pal
## [1] "#FEEDDE" "#FDD0A2" "#FDAE6B" "#FD8D3C" "#E6550D" "#A63603"
orange_rev <- rev(orange_pal)
orange_rev
## [1] "#A63603" "#E6550D" "#FD8D3C" "#FDAE6B" "#FDD0A2" "#FEEDDE"
gun_p <- ggplot(data = county_full, mapping = aes(x = long, y = lat, group = group, fill = su_gun6))
gun_p1 <- gun_p + geom_polygon(color = "gray90", size = 0.5) + coord_equal()
gun_p2 <- gun_p1 + scale_fill_manual(values = orange_pal)
gun_p2 + labs(title = "Gun-Related Suicides, 1999-2015", fill = "Rate per 100,000 pop.") + theme_map() + theme(legend.position = "bottom")

pop_p <- ggplot(data = county_full, mapping = aes(x = long, y = lat, group = group, fill = pop_dens6))
pop_p1 <- pop_p + geom_polygon(color = "gray90", size = 0.5) + coord_equal()
pop_p2 <- pop_p1 + scale_fill_manual(values = orange_rev)
pop_p2 + labs(title = "Reverse-coded Population Density", fill = "People per square mile") + theme_map() + theme(legend.position = "bottom")

library(statebins)
statebins_continuous(state_data = election, state_col = "state", text_color = "white", value_col = "pct_trump", brewer_pal = "Reds", font_size = 3, legend_title = "Percent Trump")
## Warning: `show_guide` has been deprecated. Please use `show.legend` instead.

statebins_continuous(state_data = subset(election, st %nin% "DC"), state_col = "state", text_color = "black", value_col = "pct_clinton", brewer_pal = "Blues", font_size = 3, legend_title = "Percent Clinton")
## Warning: `show_guide` has been deprecated. Please use `show.legend` instead.

election <- election %>% mutate(color = recode(party, Republican = "darkred", Democrat = "royalblue"))
statebins_manual(state_data = election, state_col = "st", color_col = "color", text_color = "white", font_size = 3, legend_title = "Winner", labels = c("Trump", "Clinton"), legend_position = "right")
## Warning: `show_guide` has been deprecated. Please use `show.legend` instead.

statebins(state_data = election, state_col = "state", value_col = "pct_trump", text_color = "white", breaks = 4, labels = c("4-21", "21-37", "37-53", "53-70"), brewer_pal = "Reds", font_size = 3, legend_title = "Percent Trump")
## Warning: `show_guide` has been deprecated. Please use `show.legend` instead.

opiates$region <- tolower(opiates$state)
opiates_map <- left_join(us_states, opiates)
## Joining, by = "region"
library(viridis)
## Loading required package: viridisLite
p0 <- ggplot(data = subset(opiates_map, year > 1999), mapping = aes(x = long, y = lat, group = group, fill = adjusted))
p1 <- p0 + geom_polygon(color = "gray90", size = 0.05) + coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p2 <- p1 + scale_fill_viridis_c(option = "plasma")
p2 + theme_map() + facet_wrap(~year,ncol = 3) + theme(legend.position = "bottom", strip.background = element_blank()) + labs(fill = "Death rate per 100,000 population ", title = "Opiate Related Deaths by State, 2000-2014")

p <- ggplot(data = opiates, mapping = aes(x = year, y = adjusted, group = state))
p + geom_line(color = "gray70")
## Warning: Removed 17 rows containing missing values (geom_path).

p0 <- ggplot(data = drop_na(opiates, division_name), mapping = aes(x = year, y = adjusted))
p1 <- p0 + geom_line(color = "gray70", mapping = aes(group = state))
p2 <- p1 + geom_smooth(mapping = aes(group = division_name), se = FALSE)
p3 <- p2 + geom_text_repel(data = subset(opiates, year == max(year) & abbr !="DC"), mapping = aes(x = year, y = adjusted, label = abbr), 
                        size = 1.8, segment_color = NA, nudge_x = 30) + coord_cartesian(c(min(opiates$year), max(opiates$year)))
## Warning: Ignoring unknown parameters: segment_colour
p3 + labs(x = "", y = "Rate per 100,000 population", title = "State-Level Opiate Death Rates by Census Division, 1999-2014") + facet_wrap(~ reorder(division_name, -adjusted, na.rm = TRUE), ncol = 3)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 27 rows containing non-finite values (stat_smooth).
## Warning: Removed 17 rows containing missing values (geom_path).

Refine your Plots

head(asasec)
##                                Section         Sname Beginning Revenues
## 1      Aging and the Life Course (018)         Aging     12752    12104
## 2     Alcohol, Drugs and Tobacco (030) Alcohol/Drugs     11933     1144
## 3 Altruism and Social Solidarity (047)      Altruism      1139     1862
## 4            Animals and Society (042)       Animals       473      820
## 5             Asia/Asian America (024)          Asia      9056     2116
## 6            Body and Embodiment (048)          Body      3408     1618
##   Expenses Ending Journal Year Members
## 1    12007  12849      No 2005     598
## 2      400  12677      No 2005     301
## 3     1875   1126      No 2005      NA
## 4     1116    177      No 2005     209
## 5     1710   9462      No 2005     365
## 6     1920   3106      No 2005      NA
p <- ggplot(data = subset(asasec, Year == 2014), mapping = aes(x = Members, y = Revenues, label = Sname))
p + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

p <- ggplot(data = subset(asasec, Year == 2014), mapping = aes(x = Members, y = Revenues, label = Sname))
p + geom_point(mapping = aes(color = Journal)) + geom_smooth(method = "lm")

p0 <- ggplot(data = subset(asasec, Year == 2014), mapping = aes(x = Members, y = Revenues, label = Sname))
p1 <- p0 + geom_point(mapping = aes(color = Journal)) + geom_smooth(method = "lm", se = FALSE, color = "gray80")
p2 <- p1 + geom_text_repel(data = subset(asasec, Year == 2014 & Revenues > 7000), size = 2)
p3 <- p2 + labs(x = "Membership", y = "Revenues", color = "Section has own Journal", title = "ASA Sections", subtitle = "2014 Calendar year.", caption = "Source: ASA annual report.")
p4 <- p3 + scale_y_continuous(labels = scales::dollar) + theme(legend.position = "top")
p4

p <- ggplot(data = organdata, mapping = aes(x = roads, y = donors, color = world))
p + geom_point(size = 2) + scale_color_brewer(palette = "Set2") + theme(legend.position = "top")
## Warning: Removed 46 rows containing missing values (geom_point).

p + geom_point(size = 2) + scale_color_brewer(palette = "Pastel2") + theme(legend.position = "top")
## Warning: Removed 46 rows containing missing values (geom_point).

p + geom_point(size = 2) + scale_color_brewer(palette = "Dark2") + theme(legend.position = "top")
## Warning: Removed 46 rows containing missing values (geom_point).

#Democratic Blue and Republican Red
party_colors <- c("#2E74C0", "#CB454A")

p0 <- ggplot(data = subset(county_data, flipped =="No"), mapping = aes(x = pop, y = black/100))
p1 <- p0 + geom_point(alpha = 0.15, color = "gray50") + scale_x_log10(labels = scales::comma)
p1

p2 <- p1 + geom_point(data = subset(county_data, flipped == "Yes"), mapping = aes(x=pop, y = black/100, color = partywinner16)) + scale_color_manual(values = party_colors)
p2

p3 <- p2 + scale_y_continuous(labels = scales::percent) + labs(color = "County flipped to ...", x = "County Population (log scale)", y = "Percent Black Population", title = "Flipped counties, 2016", caption = "Counties in gray did not flip.")
p3

p4 <- p3 + geom_text_repel(data = subset(county_data, flipped == "Yes" & black > 25), mapping = aes(x = pop, y = black/100, label = state), size = 2)
p4 + theme_minimal() + theme(legend.position = "top")

theme_set(theme_bw())
p4 + theme(legend.position = "top")

theme_set(theme_dark())
p4 + theme(legend.position = "top")

library(ggthemes)
theme_set(theme_economist())
p4 + theme(legend.position = "top")

theme_set(theme_wsj())
p4 + theme(plot.title = element_text(size = rel(0.6)),
           legend.title = element_text(size = rel(0.35)),
           plot.caption = element_text(size = rel(0.35)),
           legend.position = "top")

yrs <- c(seq(1972,1988, 4), 1993, seq(1996,2016,4))

mean_age <- gss_lon %>% filter(age %nin% NA && year %in% yrs) %>% group_by(year) %>% summarize(xbar = round(mean(age, na.rm = TRUE), 0))
mean_age$y <- 0.3
library(cowplot)
## 
## ********************************************************
## Note: As of version 1.0.0, cowplot does not change the
##   default ggplot2 theme anymore. To recover the previous
##   behavior, execute:
##   theme_set(theme_cowplot())
## ********************************************************
## 
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggthemes':
## 
##     theme_map
library(hrbrthemes)
## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
##       Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
##       if Arial Narrow is not on your system, please see http://bit.ly/arialnarrow
yr_labs <- data.frame(x = 85, y = 0.8, year = yrs)
p <- ggplot(data = subset(gss_lon, year %in% yrs), mapping = aes(x = age))
p1 <- p + geom_density(fill = "gray20", color = FALSE, alpha = 0.9, mapping = aes(y = ..scaled..)) + 
  geom_vline(data = subset(mean_age, year %in% yrs), aes(xintercept = xbar), color = "white", size = 0.5) +
  geom_text(data = subset(mean_age, year %in% yrs), aes(x = xbar, y = y, label = xbar), nudge_x = 7.5, color = "white", size = 3.5, hjust = 1) + 
  geom_text(data = subset(yr_labs, year %in% yrs), aes(x = x, y = y, label = year)) + facet_grid(year ~ ., switch = "y")
theme_set(theme_base())
p1 + theme(legend.position = "top")
## Warning: Removed 83 rows containing non-finite values (stat_density).

library(ggridges)
## 
## Attaching package: 'ggridges'
## The following object is masked from 'package:ggplot2':
## 
##     scale_discrete_manual
p <- ggplot(data = gss_lon, mapping = aes(x = age, y = factor(year, levels = rev(unique(year)), ordered = TRUE)))
p + geom_density_ridges(alpha = 0.6, fill = "lightblue", scale = 1.5) + scale_x_continuous(breaks = c(25,50,75)) + 
  scale_y_discrete(expand = c(0.01,0)) + 
  labs(x = "Age", y = NULL,
       title = "Age Distribution of\nGSS Respondents") + 
  theme_ridges() + 
  theme(title = element_text(size = 16, face = "bold"))
## Picking joint bandwidth of 3.49
## Warning: Removed 221 rows containing non-finite values (stat_density_ridges).

head(fredts)
##         date  sp500 monbase  sp500_i monbase_i
## 1 2009-03-11 696.68 1542228 100.0000  100.0000
## 2 2009-03-18 766.73 1693133 110.0548  109.7849
## 3 2009-03-25 799.10 1693133 114.7012  109.7849
## 4 2009-04-01 809.06 1733017 116.1308  112.3710
## 5 2009-04-08 830.61 1733017 119.2240  112.3710
## 6 2009-04-15 852.21 1789878 122.3245  116.0579
fredts_m <- fredts %>% select(date, sp500_i, monbase_i) %>% gather(key = series, value = score, sp500_i:monbase_i)
head(fredts_m)
##         date  series    score
## 1 2009-03-11 sp500_i 100.0000
## 2 2009-03-18 sp500_i 110.0548
## 3 2009-03-25 sp500_i 114.7012
## 4 2009-04-01 sp500_i 116.1308
## 5 2009-04-08 sp500_i 119.2240
## 6 2009-04-15 sp500_i 122.3245
p <- ggplot(data = fredts_m, mapping = aes(x = date, y = score, group = series, color = series))
p1 <- p + geom_line() + theme(legend.position = "top") + 
  labs(x = "Date",
       y = "Index",
       color = "Series")
p <- ggplot(data = fredts, mapping = aes(x=date, y = sp500_i - monbase_i))
p2 <- p + geom_line() + labs(x = "Date",
                             x = "Difference")
cowplot::plot_grid(p1, p2, nrow = 2, rel_heights = c(0.75, 0.25), align = "v")
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.

head(yahoo)
##   Year Revenue Employees Mayer
## 1 2004    3574      7600    No
## 2 2005    5257      9800    No
## 3 2006    6425     11400    No
## 4 2007    6969     14300    No
## 5 2008    7208     13600    No
## 6 2009    6460     13900    No
p <- ggplot(data = yahoo, mapping = aes(x = Employees, y = Revenue))
p + geom_path(color = "gray80") + 
  geom_text(aes(color = Mayer, label = Year), 
            size = 3, fontface = "bold") + 
  theme(legend.position = "bottom") + 
  labs(color = "Mayer is CEO", x = "Employees", y ="Revenue (Millions)",
       title = "Yahoo Employees vs Revenues, 2004-2014") + 
  scale_y_continuous(labels = scales::dollar) + 
  scale_x_continuous(labels = scales::comma)

p <- ggplot(data = yahoo, mapping = aes(x = Year, y = Revenue/Employees))
p + geom_line(color = "gray60", size = 2) + 
  geom_vline(xintercept = 2012) + 
  annotate("text", x = 2013, y = 0.44, label = " Mayer becomes CEO", size = 2.5) + 
  labs(x = "Year\n", y = "Revenue/Employees",
       title = "Yahoo Revenue to Employee Ratio, 2004-2014")

head(studebt)
## # A tibble: 6 x 4
##   Debt     type        pct Debtrc  
##   <ord>    <fct>     <int> <ord>   
## 1 Under $5 Borrowers    20 Under $5
## 2 $5-$10   Borrowers    17 $5-$10  
## 3 $10-$25  Borrowers    28 $10-$25 
## 4 $25-$50  Borrowers    19 $25-$50 
## 5 $50-$75  Borrowers     8 $50-$75 
## 6 $75-$100 Borrowers     3 $75-$100
p_xlab <- "Amount Owed, in thousands of Dollars"
p_title <- "Outstanding Student Loans"
p_subtitle <- "44 million borrowers owe a total of $1.3 trillion"
p_caption <- "Source: FRB NY"

f_labs <- c(`Borrowers` = "Percent of \nall Borrowers",
            `Balances` = "Percent of \nall Balances")

p <- ggplot(data = studebt, mapping = aes(x = Debt, y = pct/100, fill = type))
p +  geom_bar(stat = "identity") + scale_fill_brewer(type = "qual", palette = "Dark2") + 
  scale_y_continuous(labels = scales::percent) + 
  guides(fill = FALSE) + 
  theme(strip.text.x = element_text(face = "bold")) + 
  labs(y = NULL, x = p_xlab,
       caption = p_caption,
       title = p_title,
       subtitle = p_subtitle) + 
  facet_grid(~type, labeller = as_labeller(f_labs)) + 
  coord_flip()

library(viridis)

p <- ggplot(studebt, aes(y = pct/100, x = type, fill = Debtrc))
p + geom_bar(stat = "identity", color = "gray80") + 
  scale_x_discrete(labels = as_labeller(f_labs)) + 
  scale_y_continuous(labels = scales::percent) + 
  scale_fill_viridis(discrete = TRUE) + 
  guides(fill = guide_legend(reverse = TRUE,
                             title.position = "top",
                             label.position = "bottom",
                             keywidth = 3,
                             nrow = 1)) + 
  labs(x = NULL, y = NULL,
       fill = "Amount Owed, in thousands of dollars",
       caption = p_caption,
       title = p_title,
       subtitle = p_subtitle) + 
  theme(legend.position = "top",
        axis.text.y = element_text(face = "bold", hjust = 1, size = 12),
        axis.ticks.length = unit(0, "cm"),
        panel.grid.major.y = element_blank()) + 
  coord_flip()